********************
READ ME - REPLICATION PACKAGE

How Important is Editorial Gatekeeping?
Krieger, Myers, Stern
********************



********************
REPLICATION INSTRUCTIONS
********************
1. unzip the replication package .zip file "_replication_package.zip", which contains the code and directory structure necessary for execution

2. the "./data/raw" directory contains four (4) .zip files that, because of their size, have temporarily been given an alternative file extension to prevent the Dataverse from unpacking their contents. These four files (listed below) currently have the ".zip_tobeunzipped" extension; change it to ".zip" and then unpack each file into the "./data/raw" directory.
	- editor_affil.zip_tobeunzipped
	- authority_pmidinfo_dtas.zip_tobeunzipped
	- authority_pmidinfo_csvs.zip_tobeunzipped
	- authority_long.zip_tobeunzipped
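
The renaming and unpacking in step 2 can also be scripted. The sketch below is not part of the replication package; it is a minimal Python illustration that assumes it is run from the root of the unzipped package, and the function name "restore_and_unpack" is hypothetical.

```python
import zipfile
from pathlib import Path

# Temporary extension used in the package (see step 2 above).
SUFFIX = ".zip_tobeunzipped"


def restore_and_unpack(raw_dir):
    """Rename every *.zip_tobeunzipped file in raw_dir to *.zip and extract it there.

    Hypothetical helper, not shipped with the replication package.
    Returns the names of the restored .zip files.
    """
    raw_dir = Path(raw_dir)
    restored = []
    for src in sorted(raw_dir.glob("*" + SUFFIX)):
        # Swap the temporary extension for a plain ".zip".
        dst = src.with_name(src.name[: -len(SUFFIX)] + ".zip")
        src.rename(dst)
        # Unpack the archive into the same directory, as step 2 requires.
        with zipfile.ZipFile(dst) as archive:
            archive.extractall(raw_dir)
        restored.append(dst.name)
    return restored


# Example call (assumes the current directory is the unzipped package root):
# restore_and_unpack("./data/raw")
```

Doing the same thing by hand in a file browser or terminal is equally valid; the script only saves repeating the rename four times.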

3. replace the word "LOCATION" on line 8 of "./code/_master.do" with the full path to the unzipped directory

4. run "./code/_master.do", which executes all necessary scripts and outputs tables and figures to the "./figtab" folder



********************
REPLICATION SOFTWARE / HARDWARE
********************
Last run in 2023, using Stata 17.0, on a Mac with a 3.2 GHz 6-Core Intel i7 and 32 GB of RAM. The full replication took approximately 2.5 hours to complete. The necessary packages are installed on lines 23-25 of "./code/_master.do".



********************
DATA SOURCES
********************
1. "authority" - all data within this folder are transformations of the full "Author-ity" dataset generated and maintained by Vetle Torvik (http://abel.lis.illinois.edu). For the full, un-transformed underlying data, please contact Vetle and/or his research group to gain access. We are very grateful for his and his team's work on this dataset.

2. "author_editor_crosswalk.csv" - is a file that links the manually collected editorial names and roles to their respective entries in the "authority" dataset, based on string matches of the names as well as years of activity.

3. "cleaned_mesh_trees.csv" - is a file that contains both the "term" and "tree" code for all entries in the National Library of Medicine's (NLM) Medical Subject Heading (MeSH) structure. For more on the MeSH tree, see: https://www.nlm.nih.gov/mesh/intro_trees.html. 
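
For readers unfamiliar with MeSH tree codes: they are dot-separated, and each prefix of a code identifies an ancestor in the hierarchy. The Python sketch below illustrates that structure only; the function names and the example code are illustrative assumptions, and nothing here reflects how "cleaned_mesh_trees.csv" was actually built.

```python
def tree_ancestors(tree_code):
    """Return the ancestor tree codes of a dot-separated MeSH tree code.

    Illustrative helper, not part of the replication package.
    """
    parts = tree_code.split(".")
    # Every proper prefix of the code is an ancestor node.
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def tree_depth(tree_code):
    """Return the depth of a tree code (number of dot-separated segments)."""
    return len(tree_code.split("."))


# Example with an illustrative tree code:
# tree_ancestors("C04.557.337") gives ["C04", "C04.557"]
# tree_depth("C04.557.337") gives 3
```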

4. "editorJournalCosineDistances.dta" - is a file that contains the cosine similarity between each editor and journal in our data based on the tf-idf scores of the MeSH terms contained within the editor's and journal's publications.
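
As a rough illustration of the measure described in item 4, the Python sketch below computes the cosine similarity between tf-idf vectors of MeSH terms. It is not the code used to produce "editorJournalCosineDistances.dta": the weighting (raw term counts times log(N / document frequency)), the function names, and the toy inputs are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build tf-idf vectors from a dict mapping a name to a list of MeSH terms.

    Assumed weighting: raw term count times log(N / document frequency).
    The weighting used in the replication data may differ.
    """
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # count each term once per editor or journal
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        vectors[name] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a vector with no weight is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Toy inputs (made up for illustration, not drawn from the data).
docs = {
    "editor_A": ["Neoplasms", "Neoplasms", "Genomics"],
    "journal_X": ["Neoplasms", "Genomics", "Genomics"],
    "journal_Y": ["Heart Failure", "Hypertension"],
}
vectors = tfidf_vectors(docs)
sim_ax = cosine_similarity(vectors["editor_A"], vectors["journal_X"])
sim_ay = cosine_similarity(vectors["editor_A"], vectors["journal_Y"])
```

In this toy case the editor shares both terms with journal_X, giving a similarity of about 0.8, and shares none with journal_Y, giving 0.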

5. "journal_list.csv" - is a file containing the journal names (and alternative formats) and NLM ID numbers for the journals in our sample.




********************
FIGURES & TABLES
********************
NOTE: all scripts below require the "00a_databuild.do" and "00b_databuild_affil.do" scripts to be executed first.

Table 1: Summary Statistics -- generated by the "./1_sumstats.do" script
Figure 1: Example MeSH Topic Trends -- generated by the "./1_sumstats.do" script
Table 2: Main Results -- generated by the "./2_main_binreg.do" script
Figure 2: Journal Similarity, Observed and After Simulated “Takeovers” -- generated by the "./6_sim.do" script

